0.1 Overview


This section covers the basic hyperparameter tuning of the six models. Where a model has several hyperparameters, as with Random Forest and Gradient Boosting Tree, further tuning is required to ensure near-optimal values have been found. Exhaustively searching the hyperparameter space of the ensemble techniques would not be computationally sensible. Instead, the tuning could be done in two phases: a first phase that is general and broad, followed by a second phase that searches a finer range of values using the insights from the first.
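The two-phase approach can be sketched with scikit-learn's GridSearchCV. The parameter names are real scikit-learn parameters, but the value ranges below are illustrative assumptions, not the grids used in this report:

```python
# Illustrative coarse-then-fine grid search (value ranges are assumptions).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)

# Phase 1: broad grid spanning orders of magnitude.
coarse = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [10, 100, 500], "max_depth": [2, 8, 32]},
    cv=3,
).fit(X, y)
n = coarse.best_params_["n_estimators"]
d = coarse.best_params_["max_depth"]

# Phase 2: finer grid centred on the best values from phase 1.
fine = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [max(1, n // 2), n, n * 2],
     "max_depth": [max(1, d - 2), d, d + 2]},
    cv=3,
).fit(X, y)
print(fine.best_params_)
```

The second grid is anchored on the first phase's winner, so the total number of fits stays far below one exhaustive search over the combined ranges.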

DUM - Dummy
GNB - GaussianNaiveBayes
GB - GradientBoosting
KNN - KNearestNeighbours
LSCM - LinearSVC
LG - LogisticRegression
RF - RandomForest
SVM - SupportVectorMachine

1 Logistic Regression


Logistic Regression (LG) was trained across three different hyperparameters, each relating to regularisation.
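In scikit-learn the three regularisation hyperparameters are C, penalty, and l1_ratio (the last only applying to the elastic-net penalty, which requires the "saga" solver). The grids below are illustrative assumptions, not the report's actual search values:

```python
# Sketch of LG's three regularisation hyperparameters (grids are assumed).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)

# l1_ratio is only meaningful for penalty="elasticnet", so use a list of
# grids to avoid redundant combinations.
grids = [
    {"penalty": ["l1", "l2"], "C": [0.01, 0.1, 1, 10]},
    {"penalty": ["elasticnet"], "C": [0.01, 0.1, 1, 10],
     "l1_ratio": [0.25, 0.5, 0.75]},
]
search = GridSearchCV(
    LogisticRegression(solver="saga", max_iter=5000), grids, cv=3
).fit(X, y)
print(search.best_params_)
```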

1.1 C

1.2 Penalty

1.3 L1 Ratio

1.4 Best Hyperparameters for each Metric

2 K-Nearest Neighbours


K-Nearest Neighbours (KNN) was trained across only two hyperparameters:
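The two hyperparameters are not named in this section; a common pair for scikit-learn's KNeighborsClassifier is n_neighbors and weights, which are assumed here purely for illustration:

```python
# Sketch of a two-hyperparameter KNN search. The choice of n_neighbors and
# weights is an assumption; the grid values are also illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, random_state=0)
grid = {
    "n_neighbors": [3, 5, 11, 21],           # size of the neighbourhood
    "weights": ["uniform", "distance"],      # equal vs distance-weighted votes
}
search = GridSearchCV(KNeighborsClassifier(), grid, cv=3).fit(X, y)
print(search.best_params_)
```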

2.1 Best Hyperparameters for each Metric

3 Support Vector Machine


A variety of Support Vector Machines (SVM) were trained on three hyperparameters:

Note: Gamma only applies to a polynomial or RBF kernel. scikit-learn offers two methods of determining an appropriate value, "scale" and "auto": "scale" sets gamma to 1 / (n_features × Var(X)), while "auto" sets it to 1 / n_features.

3.1 C

3.2 Kernel

3.3 Gamma

3.4 Best Hyperparameters for Each Metric

4 Linear Support Vector Machine*


*Scikit-learn has two APIs for an SVM: SVC and LinearSVC. The latter only supports a linear kernel but offers more methods of regularisation. It is also reported to be significantly quicker - by an order of magnitude - for larger datasets.

A Linear SVM (LSCM) does not take the kernel or gamma hyperparameters, since it is constrained to a linear kernel, which has no gamma parameter. However, due to various technical issues, only the L2 penalty was used rather than also including L1 and ElasticNet. The hyperparameters used were:
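In scikit-learn's LinearSVC the two remaining hyperparameters are C and loss; the grid values below are illustrative assumptions:

```python
# Sketch of the LinearSVC search over C and loss (grids are assumed).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, random_state=0)
grid = {
    "C": [0.01, 0.1, 1, 10],
    "loss": ["hinge", "squared_hinge"],
}
# penalty="l2" with loss="hinge" requires the dual formulation.
search = GridSearchCV(
    LinearSVC(penalty="l2", dual=True, max_iter=20000), grid, cv=3
).fit(X, y)
print(search.best_params_)
```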

4.1 C

4.2 Loss

4.3 Best Hyperparameter for Each Metric

5 Random Forest


Random Forest (RF) is an ensemble technique that trains several decision trees and aggregates across them to form a stronger predictor. RF has several hyperparameters to tune; since they are not all equally important, only a selected few were tested:
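The five selected hyperparameters map onto scikit-learn's RandomForestClassifier as shown below; the grid values are illustrative assumptions, kept small so the search stays cheap:

```python
# Sketch of the five RF hyperparameters from this section (grids are assumed).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)
grid = {
    "n_estimators": [50, 100],          # number of trees
    "max_depth": [4, None],             # None lets trees grow fully
    "min_samples_split": [2, 10],
    "min_samples_leaf": [1, 4],
    "max_features": ["sqrt", "log2"],   # features considered per split
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), grid, cv=3
).fit(X, y)
print(search.best_params_)
```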

5.1 Number of Trees

5.2 Max Depth

5.3 Min Samples Splits

5.4 Min Samples Leaf

5.5 Max Number of Features

5.6 Best Hyperparameters for Each Metric

6 Gradient Boosted Tree


Similar to Random Forest, Gradient Boosted Trees (GB) have several hyperparameters to tune, and the same parameters as for RF were used here. GB also has an additional hyperparameter that determines how much the previous generation of trees influences the current tree (i.e. the learning rate).
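In scikit-learn the extra hyperparameter is learning_rate on GradientBoostingClassifier, which shrinks each tree's contribution to the running prediction. The grid below is an illustrative assumption, trimmed to a few values per parameter:

```python
# Sketch of the GB search including the learning rate (grids are assumed).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)
grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 4],
    "learning_rate": [0.01, 0.1, 0.5],  # shrinks each tree's contribution
}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0), grid, cv=3
).fit(X, y)
print(search.best_params_)
```

A smaller learning rate typically needs more trees to reach the same fit, which is why the two are usually tuned together.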

6.1 Number of Trees

6.2 Max Depth

6.3 Min Samples Split

6.4 Min Samples Leaf

6.5 Max number of Features

6.6 Learning Rate

6.7 Best Hyperparameters for Each Metric